Methods of Data Analysis: Working with probability distributions
Not recorded
Abstract
One of the key problems in non-parametric data analysis is to build a good model of the probability distribution that generated the data, given only a finite sample from that distribution. This problem is ill-posed for continuous distributions: with finite data, there is no way to distinguish between (or exclude) candidate distributions unless they are restricted to be smooth. The question then becomes how to formulate the problem so that a finite data set can be used to choose the “best” distribution among those that are “sufficiently smooth”, where, intuitively, more data should allow us to consider less smooth distributions. A popular way of constructing such continuous distribution estimates is kernel density estimation (KDE). Another method is to describe the data with a parametric family of distributions that is sufficiently rich that, in the limit of large data, it can describe an arbitrary distribution. A well-known example of this class is the mixture of Gaussians. Another versatile framework for modeling both continuous and discrete distributions of potentially high dimensionality from a finite sample is the maximum entropy (ME) approach. Here, we look for the most random (that is, maximum entropy) distribution that exactly reproduces a chosen set of statistics which can be reliably estimated from data. The assumption of maximum entropy is a formal version of Occam’s razor: one chooses distributions of a particular form that contain a minimal amount of structure that is nevertheless sufficient to explain selected aspects of observations (constraints).
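As a rough illustration of the kernel density estimation idea mentioned in the abstract, the following Python sketch places a Gaussian kernel on each data point and averages them. The function names `gaussian_kde_1d` and `normal_reference_bandwidth`, the synthetic normal sample, and the rule-of-thumb bandwidth (1.06 times the sample standard deviation times n^(-1/5)) are assumptions made for this example and are not taken from the original work. The bandwidth plays the role of the smoothness restriction: it shrinks as the sample grows, so more data permits a less smooth estimate.

```python
import numpy as np


def gaussian_kde_1d(sample, grid, bandwidth):
    """Evaluate a 1-D Gaussian kernel density estimate on the points in `grid`."""
    sample = np.asarray(sample, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Scaled distance between every grid point and every sample point.
    z = (grid[:, None] - sample[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    # Average the kernels and divide by the bandwidth so the estimate integrates to 1.
    return kernels.mean(axis=1) / bandwidth


def normal_reference_bandwidth(sample):
    """Rule-of-thumb bandwidth (an assumption here): 1.06 * std * n^(-1/5)."""
    sample = np.asarray(sample, dtype=float)
    return 1.06 * sample.std(ddof=1) * sample.size ** (-1.0 / 5.0)


# Synthetic data for illustration only.
rng = np.random.default_rng(0)
small = rng.normal(size=200)
large = rng.normal(size=2000)

h_small = normal_reference_bandwidth(small)
h_large = normal_reference_bandwidth(large)

grid = np.linspace(-8.0, 8.0, 2001)
density = gaussian_kde_1d(small, grid, h_small)

# Riemann-sum check that the estimate carries (approximately) unit probability mass.
mass = density.sum() * (grid[1] - grid[0])
print(f"bandwidth (n=200):  {h_small:.3f}")
print(f"bandwidth (n=2000): {h_large:.3f}")
print(f"total mass on grid: {mass:.4f}")
```

The fixed-bandwidth Gaussian kernel is only one possible choice; other kernels or data-driven bandwidth selection (for example by cross-validation) would fit the same framework.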
Similar resources
Flood Flow Frequency Model Selection Using L-moment Method in Arid and Semi Arid Regions of Iran
Statistical frequency analysis is the most common procedure for the analysis of flood data at a gauged location that in first step it is needed to select a model to represent the population. Among them, the central moment has been the most common and widely used, and with the using of computers, the application of the maximum likelihood has increased. This research was carried out in order to reco...
Low flow frequency analysis by L-moments method (Case study: Iranian Central Plateau River Basin)
Knowledge about low flow statistics is essential for effective water resource planning and management in ungauged or poorly gauged catchment areas, especially in arid and semi-arid regions such as Iran. We employed a data set of 20 river flow time-series from the Iranian Central Plateau River Basin, Iran to evaluate the low-flow series using several frequency analysis methods and compared the resu...
SPREAD DATA ANALYSIS OF ALUMINUM OXIDE SPLATS REINFORCED WITH CARBON NANOTUBES
Coating of a surface by droplet spreading plays an important role in many novas industrial processes, such as plasma spray coating, ink jet printing, nano safeguard coatings and nano self-assembling. Data analysis of nano and micro droplet spreading can be widely used to predict and optimize coating processes. In this article, we want to select the most appropriate statistical distribution for ...
Probability Distribution Fitting to Maternal Mortality in Nigeria.
The consequences of Maternal Mortality (MM) cannot be overemphasized. It inhibits population growth resulting into loss of lives among others. This work tends to obtain the maternal mortality rates (MMR) in Nigeria, identify some fitted distributions to MMR and determine which of the distributions best fits the data. A comprehensive Exploratory Data Analysis (EDA) was carried on MM and the MMRs...
Frequency Analysis of Maximum Daily Rainfall in various Climates of Iran
In this research in order to frequency analysis of maximum daily rainfall in various climates of Iran the data of 40 synoptic rain gauges collected in 40 years period i.e., 1973 to 2012 were used. These stations are located in various climates of Iran according to De Martonne climatic classification. At first, input of data to HYFA package was performed. The mentioned package includes seven...
A Statistical Analysis of the Aircraft Landing Process
Managing operations of the aircraft approach process and analyzing runway landing capacity, utilization and related risks require detailed insight into the stochastic characteristics of the process. These characteristics can be represented by probability distributions. The focus of this study is analyzing landings on a runway operating independent of other runways making it as a single runway. ...